The Johnson-Lindenstrauss Lemma allows for the projection of $n$ points in $p$-dimensional Euclidean space onto a $k$-dimensional Euclidean space, with $k \ge \frac{24 \ln n}{3\epsilon^2 - 2\epsilon^3}$, so that the pairwise distances are preserved within a factor of $1 \pm \epsilon$. Here, working directly with the distributions of the random distances rather than resorting to the moment generating function technique, an improvement on the lower bound for $k$ is obtained. The additional reduction in dimension when compared to bounds found in the literature is at least 13%, and, in some cases, up to 30% additional reduction is achieved. Using the moment generating function technique, we further provide a lower bound for $k$ using pairwise $L_2$ distances in the space of points to be projected and pairwise $L_1$ distances in the space of the projected points. Comparison with the results obtained in the literature shows that the bound presented here provides an additional 36-40% reduction.

1 Introduction






With the arrival of the "small n, large p" paradigm, dimension reduction methods have come to the forefront of many applications. For example, in survival analysis studies using microarray data, on the order of 10-20K expressions per patient are collected. On the other hand, usually only a few hundred patients are available for the study. For this reason, one must reduce the dimension of the gene expression data matrix before embarking on any type of analysis. The challenge of "small n, large p" also arises in high throughput molecular screening, astronomy, and image analysis. Among the various dimension reduction techniques that are used, some new and some old, the Random Projection method has attracted a lot of attention lately. Random Projection (RP) provides a computational method for dimension reduction whereby the original $p$-dimensional data points are projected onto a $k$-dimensional subspace by multiplying the $n \times p$ data matrix $X$ by a $p \times k$ random matrix $\Gamma$. In matrix notation,

$$T = X\Gamma \tag{1}$$

where $X$ is the $n \times p$ data matrix, $\Gamma$ is a $p \times k$ random matrix, and $T$ is the resulting $n \times k$ matrix consisting of the points projected onto a lower $k$-dimensional subspace.
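As a concrete illustration of (1), the following minimal sketch projects a small synthetic data matrix with a Gaussian random matrix and records how the squared pairwise distances change. It uses only the Python standard library. The helper names (`gaussian_matrix`, `project`, `sq_dist`), the sizes, and the uniform test data are illustrative assumptions, and the $1/\sqrt{k}$ scaling is the one used for the mapping $f$ in Section 2 rather than part of (1) itself.

```python
import math
import random

def gaussian_matrix(p, k, rng):
    """Return a p x k matrix (list of rows) with i.i.d. N(0, 1) entries."""
    return [[rng.gauss(0.0, 1.0) for _ in range(k)] for _ in range(p)]

def project(X, G, scale):
    """Return scale * X G for an n x p matrix X and a p x k matrix G."""
    k = len(G[0])
    return [[scale * sum(x_i * g_row[j] for x_i, g_row in zip(row, G))
             for j in range(k)]
            for row in X]

def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

rng = random.Random(0)           # fixed seed so the sketch is repeatable
n, p, k = 5, 200, 400            # illustrative sizes only (k > p here, to keep noise small)
X = [[rng.uniform(-1.0, 1.0) for _ in range(p)] for _ in range(n)]
G = gaussian_matrix(p, k, rng)
T = project(X, G, 1.0 / math.sqrt(k))

# Ratio of projected to original squared distance for every pair of rows.
ratios = [sq_dist(T[i], T[j]) / sq_dist(X[i], X[j])
          for i in range(n) for j in range(i + 1, n)]
```

Each entry of `ratios` is distributed as a chi-square variable with $k$ degrees of freedom divided by $k$, so the values concentrate around 1 as $k$ grows.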

Orthogonality of the projection matrix preserves similarities, e.g. the inner product or Euclidean distance, of the original vectors when projected to the low-dimensional space. Although the random matrix $\Gamma$ is not orthogonal, Achlioptas (2001) pointed out that the loss of information is minimal because the orthogonal property is achieved with high probability in high-dimensional space. Moreover, the random projection matrix $\Gamma$ is close to orthogonal in high-dimensional space (Hecht-Nielsen, 1994). Using Random Projections, orthogonalization of the projection matrix in high-dimensional space can be avoided without losing much information in the original data (Achlioptas, 2001; Goel, 2005).



Random Projection (RP) has been used in numerous areas such as machine learning (Arriaga and Vempala, 1999; Bingham and Mannila, 2001; Candes and Tao, 2006; Dasgupta, 2000; Fern and Brodley, 2008; Fradkin and Madigan, 2002; Kaski, 1998), latent semantic indexing (Papadimitriou et al., 1998; Kurimo, 1999; Vempala, 1998), face recognition (Goel, 2005), kernel computations (Achlioptas, 2001; Blum, 2005), nearest neighbor queries (Deegalla and Boström, 2006; Kleinberg, 1997; Indyk and Motwani, 1998), privacy preserving distributed data mining (Liu et al., 2006), gene expression clustering (Bertoni and Valentini, 2005, 2006; Bertoni et al., 2008), and finding DNA motifs (Buhler and Tompa, 2002). A good overview of the use of RP is given in Bingham and Mannila (2001) and Goel (2005). While several dimension reduction methods obtain the low-dimensional subspace by optimizing a certain criterion, RP does not. For example, Principal Component Analysis (PCA) finds the set of directions that maximize the variance in the data. It turns out that the performance of RP is comparable to PCA in face recognition experiments (Goel, 2005), text and image data (Bingham and Mannila, 2001), and machine learning (Fradkin and Madigan, 2002). In addition, although PCA is a popular dimension reduction method, it is quite expensive computationally since it involves computing the eigenvalue decomposition of the data covariance matrix (Mannila et al., 2002). RP, on the other hand, is simple and computationally efficient. The computing cost for



PCA is 0{np'^) + O(p^), while that of RP is 0{k^p) when the entries to the 
random matrix are independent and identically distributed (i.i. d.) standard 



Gaussians, and $O(kp)$ when the entries are of Achlioptas type (2) (Goel, 2005; Li et al., 2006; Bingham and Mannila, 2001).

The main motivation for Random Projection (RP) is the Johnson-Lindenstrauss Lemma (1984), which states that a set of $n$ points in $p$-dimensional Euclidean space can be mapped down onto a $k = O(\ln n/\epsilon^2)$ dimensional Euclidean space such that the pairwise distance between any two points is preserved within a factor of $(1 \pm \epsilon)$ for any $0 < \epsilon < 1$. We should note that the similarity measure used in the JL Lemma is the Euclidean distance. In the original proof of the JL Lemma, Johnson and Lindenstrauss (1984) show that such a mapping is provided by a random orthogonal projection. Frankl and Maehara (1988) simplified the original proof of Johnson and Lindenstrauss using geometric techniques, and provided an improvement on the lower bound for $k$, i.e. $k \ge \left\lceil \frac{9 \ln n}{\epsilon^2 - 2\epsilon^3/3} \right\rceil + 1$. Indyk and Motwani (1998) simplified the proof of the JL Lemma using i.i.d. standard Gaussian entries for the random matrix $\Gamma$. Also, using a Gaussian random matrix, Dasgupta and Gupta (1999) further simplified the proof with elementary probabilistic techniques based on moment generating functions, and improved on the lower bound for $k$ to be

$$k \ge \frac{24 \ln n}{3\epsilon^2 - 2\epsilon^3}.$$

Instead of improving the lower bound for $k$, several papers in the literature focus on improving the computational time of the Random Projections. Achlioptas (2001) proposed two simpler distributions as alternatives to using a Gaussian random matrix:

$$r_{ij} = \begin{cases} +1 & \text{with prob. } 1/2 \\ -1 & \text{with prob. } 1/2 \end{cases} \tag{2}$$

and

$$r_{ij} = \sqrt{3} \times \begin{cases} +1 & \text{with prob. } 1/6 \\ 0 & \text{with prob. } 2/3 \\ -1 & \text{with prob. } 1/6 \end{cases} \tag{3}$$

These distributions are easy to implement, and the computational time is greatly reduced. With the distribution in (2), only 1/2 of the operations are needed, which implies a 2-fold speedup. Similarly, a 3-fold speedup is obtained for the distribution in (3). The random matrix defined through (2) and (3) can be generalized as follows:

$$r_{ij} = \sqrt{q} \times \begin{cases} +1 & \text{with prob. } \frac{1}{2q} \\ 0 & \text{with prob. } 1 - \frac{1}{q} \\ -1 & \text{with prob. } \frac{1}{2q} \end{cases} \tag{4}$$

Thus, $q = 1$ yields (2), and $q = 3$ yields (3). Furthermore, using $q \gg 3$ (e.g. $q = \sqrt{p}$ or $q = p/\ln p$) can significantly speed up the computation (Li et al., 2006) since the random matrix $\Gamma$ is very sparse. Ailon and Chazelle (2006) extended the idea of using sparse random matrices with a randomized Fourier transform to speed up the RP. The running time of the Ailon and Chazelle algorithm is improved by using a 4-wise independent deterministic code matrix with a randomized block diagonal matrix (Ailon and Liberty, 2008), and by using any deterministic matrix with tensor products and the Lean Walsh Transform (Ailon et al., 2008). Matousek (2007) provided a version of the JL Lemma that allows the entries of the random matrix $\Gamma$ to be arbitrary independent random variables with zero mean, unit variance and subgaussian tail (see Matousek (2007) for a discussion on the variants of the JL Lemma). All these improvements on the time needed to obtain the random projection, however, do not improve on the lower bound for $k$.
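The generalized distribution in (4) is straightforward to sample. The sketch below is a minimal standard-library illustration, with the hypothetical helper names `sample_entry` and `sparse_matrix`; it draws entries for a given $q$ and computes the empirical first and second moments, which should be close to 0 and 1 respectively, since every distribution in this family has zero mean and unit variance.

```python
import math
import random

def sample_entry(q, rng):
    """Draw one entry from (4): sqrt(q) times +1, 0, -1
    with probabilities 1/(2q), 1 - 1/q, 1/(2q)."""
    u = rng.random()
    half = 1.0 / (2.0 * q)
    if u < half:
        return math.sqrt(q)
    if u < 2.0 * half:
        return -math.sqrt(q)
    return 0.0

def sparse_matrix(p, k, q, rng):
    """Return a p x k random matrix with i.i.d. entries drawn from (4)."""
    return [[sample_entry(q, rng) for _ in range(k)] for _ in range(p)]

rng = random.Random(1)
draws = [sample_entry(3, rng) for _ in range(20000)]    # q = 3 gives (3)
mean = sum(draws) / len(draws)
second_moment = sum(v * v for v in draws) / len(draws)
zero_fraction = sum(1 for v in draws if v == 0.0) / len(draws)
```

With $q = 3$ about two thirds of the entries are zero, which is where the saving in multiplications comes from; with $q = 1$ the sampler reduces to the symmetric $\pm 1$ distribution in (2).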

We adopt the following notation to use throughout the paper. Denote by $\phi(\cdot)$ and $\Phi(\cdot)$ the standard Gaussian density and cumulative distribution functions, respectively. Denote by L2-L2 RP the random projection that uses $L_2$ distances in the space of points to be projected and $L_2$ distances in the space of the projected points, and by L2-L1 RP the random projection that uses $L_2$ distances in the space of points to be projected and $L_1$ distances in the space of the projected points. For $x \in \mathbb{R}^p$, let $\|x\|_1 = \sum_{j=1}^{p} |x_j|$ and $\|x\|_2 = \left(\sum_{j=1}^{p} x_j^2\right)^{1/2}$.

In this paper, the JL Lemma is revisited. In particular, an improvement on the lower bound for $k$ from Dasgupta and Gupta (1999) is provided by working directly with the exact distribution of the random Euclidean distances rather than using the moment generating function approach. The additional reduction provided by our results is at least 13% in all cases, and in other cases an additional reduction of 30% is possible when compared to the bound obtained in Dasgupta and Gupta (1999). The JL Lemma uses Euclidean distance to measure the distortion in the distances of the projected points when projecting from $p$-dimensional Euclidean space onto $k$-dimensional Euclidean space. This paper also obtains a lower bound for $k$ for the L2-L1 RP. A lower bound for $k$ using random matrices as in (4) with $q = 1, 2, 3$ is also provided for the L2-L1 RP. This improved lower bound provides an additional 36-40% reduction in dimension when compared to the results obtained in Matousek (2007).

The paper is organized as follows: section 2 discusses the JL Lemma in 
detail. The improvement to the JL bound for the L2-L2 RP is discussed in 
section 3. A lower bound for k using L2-L1 RP is provided in section 4, and 
section 5 provides some concluding remarks. Most of the technical proofs are 
relegated to an appendix. 



2 Johnson-Lindenstrauss Lemma 



In their pioneering work, Johnson and Lindenstrauss (1984) provided the following result:

Johnson-Lindenstrauss (JL) Lemma. For any $0 < \epsilon < 1$ and integer $n$, let $k$ be such that $k = O(\ln n/\epsilon^2)$. For any set $V$ of $n$ points in $\mathbb{R}^p$, there is a linear map $f : \mathbb{R}^p \to \mathbb{R}^k$ such that for any $u, v \in V$,

$$(1 - \epsilon)\,\|u - v\|^2 \le \|f(u) - f(v)\|^2 \le (1 + \epsilon)\,\|u - v\|^2. \tag{5}$$



Johnson and Lindenstrauss (1984) showed that the linear map $f$ can be taken to be a random orthogonal projection, but the explicit construction of $f$ is not provided. Indyk and Motwani (1998) and Dasgupta and Gupta (1999) gave an explicit form of the mapping $f$ in their versions of the JL Lemma. The mapping is provided by $f(x) = x\Gamma$, where $x \in \mathbb{R}^p$ and where the entries of the random matrix $\Gamma$ are i.i.d. standard Gaussians. In a remarkable paper using only elementary probabilistic techniques, Dasgupta and Gupta (1999) improved on the lower bound for $k$ from the original JL Lemma as follows.

Dasgupta and Gupta version of the JL Lemma: For any $0 < \epsilon < 1$ and integer $n$, let $k$ be such that

$$k \ge \frac{24 \ln n}{3\epsilon^2 - 2\epsilon^3}.$$

For any set $V$ of $n$ points in $\mathbb{R}^p$, there is a linear map $f : \mathbb{R}^p \to \mathbb{R}^k$ such that for any $u, v \in V$,

$$P\left[(1 - \epsilon)\,\|u - v\|^2 \le \|f(u) - f(v)\|^2 \le (1 + \epsilon)\,\|u - v\|^2\right] \ge 1 - \frac{2}{n^2}. \tag{6}$$

Let $x = u - v$. Since $f$ is linear, the inequality in (6) is equivalent to

$$P\left[\|f(x)\|^2 \ge (1 + \epsilon)\,\|x\|^2\right] + P\left[\|f(x)\|^2 \le (1 - \epsilon)\,\|x\|^2\right] \le \frac{2}{n^2}. \tag{7}$$



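The bound in (7) can be obtained by separately bounding the left- and right-tail probabilities. That is, by finding $f$ so that simultaneously,

$$P\left[\|f(x)\|^2 \ge (1 + \epsilon)\,\|x\|^2\right] \le \frac{1}{n^2}, \tag{8}$$

and

$$P\left[\|f(x)\|^2 \le (1 - \epsilon)\,\|x\|^2\right] \le \frac{1}{n^2}. \tag{9}$$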

The proof of Dasgupta and Gupta's version of the JL Lemma hinges on the use of standard Gaussians as entries to the random matrix $\Gamma$, and the moment generating function technique. The proof is sketched next, as this will set down the notation and facilitate the reading of section 3.

Sketch of the proof of Dasgupta and Gupta's version of the JL Lemma

Let $\Gamma$ be a random matrix of dimension $p \times k$ with independent entries $r_{ij} \sim N(0, 1)$. For $x \in \mathbb{R}^p$, define $f(x) = \frac{1}{\sqrt{k}}\, x\Gamma$, and $y = \sqrt{k}\, f(x)/\|x\|$. Then $y_j = x r_j/\|x\| \sim N(0, 1)$ and $\|y\|^2 \sim \chi^2_k$ with $E(\|y\|^2) = k$, where $r_j$ is the $j$-th column of $\Gamma$.

Let $\alpha_1 = k(1 + \epsilon)$, $\alpha_2 = k(1 - \epsilon)$. Then the right-tail probability is bounded by

$$P\left[\|f(x)\|^2 \ge (1 + \epsilon)\,\|x\|^2\right] = P\left[\|y\|^2 \ge k(1 + \epsilon)\right] \tag{10}$$

$$\le \left(e^{-s(1+\epsilon)}\, E\left(e^{s y_j^2}\right)\right)^k, \quad s > 0 \tag{11}$$

$$= e^{-s\alpha_1}(1 - 2s)^{-k/2}, \quad s \in (0, 1/2) \tag{12}$$

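where the inequality in (11) follows from Markov's inequality and the fact that the $y_j$'s are i.i.d. Similarly, the left-tail probability is bounded by

$$P\left[\|f(x)\|^2 \le (1 - \epsilon)\,\|x\|^2\right] = P\left[\|y\|^2 \le k(1 - \epsilon)\right] \tag{13}$$

$$\le \left(e^{s(1-\epsilon)}\, E\left(e^{-s y_j^2}\right)\right)^k, \quad s > 0 \tag{14}$$

$$\le e^{-s\alpha_1}(1 - 2s)^{-k/2}, \quad s \in (0, 1/2) \tag{15}$$

where the inequality in (15) follows from the fact that $\frac{e^{2s}}{1 + 2s} \le \frac{e^{-2s}}{1 - 2s}$ for $s \in (0, 1/2)$, which is equivalent to $4s \le \ln\frac{1 + 2s}{1 - 2s}$ on that interval. The tightest bound in (12), and hence in (15) also, is obtained by minimizing with respect to $s$. The minimizing value is $s^* = \frac{1}{2}\left(\frac{\epsilon}{1 + \epsilon}\right) \in (0, 1/2)$. Since $g(s) = e^{-sk(1+\epsilon)}(1 - 2s)^{-k/2}$, $s \in (0, 1/2)$, is strictly convex, $s^*$ is the unique minimizer of (12). Plugging $s^*$ back into (12) yields

$$P\left[\|y\|^2 \ge \alpha_1\right] \le \exp\left(-\frac{k}{2}\left(\epsilon - \ln(1 + \epsilon)\right)\right) \tag{16}$$

$$\le \exp\left(-\frac{k}{12}\left(3\epsilon^2 - 2\epsilon^3\right)\right), \tag{17}$$

where (17) is obtained after using the inequality $\ln(1 + \epsilon) \le \epsilon - \frac{\epsilon^2}{2} + \frac{\epsilon^3}{3}$.

The same bound is obtained for the left-tail probability. Thus, when $k \ge \frac{24 \ln n}{3\epsilon^2 - 2\epsilon^3}$, both $P\left[\|f(x)\|^2 \ge (1 + \epsilon)\,\|x\|^2\right]$ and $P\left[\|f(x)\|^2 \le (1 - \epsilon)\,\|x\|^2\right]$ are bounded by $1/n^2$.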
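The optimization step in the sketch above can be checked numerically. The following minimal sketch works on the log scale to avoid underflow; it evaluates the bound (12) at $s^* = \epsilon/(2(1+\epsilon))$, compares it with the closed forms (16) and (17), and confirms on a grid that no other $s \in (0, 1/2)$ does better. The function names and the values $k = 1000$, $\epsilon = 0.2$ are illustrative assumptions.

```python
import math

def log_mgf_bound(s, k, eps):
    """Log of the right-hand side of (12): exp(-s k (1+eps)) (1 - 2s)^(-k/2)."""
    return -s * k * (1.0 + eps) - 0.5 * k * math.log(1.0 - 2.0 * s)

def s_star(eps):
    """Minimizer of (12) over s in (0, 1/2)."""
    return 0.5 * eps / (1.0 + eps)

k, eps = 1000, 0.2                      # illustrative values only
best = log_mgf_bound(s_star(eps), k, eps)

# Closed forms (16) and (17), also on the log scale.
closed_form_16 = -0.5 * k * (eps - math.log(1.0 + eps))
closed_form_17 = -(k / 12.0) * (3.0 * eps ** 2 - 2.0 * eps ** 3)

# Grid search over (0, 1/2) as an independent check of the minimizer.
grid_min = min(log_mgf_bound(0.001 * i, k, eps) for i in range(1, 500))
```

Here `best` and `closed_form_16` agree up to rounding, and `closed_form_17` is slightly larger, reflecting the slack introduced by the logarithm inequality used to pass from (16) to (17).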

3 Improvement on the bound provided by the 
JL Lemma. 

In the previous proof, the left- and right-tail probabilities are bounded by 
using Markov's inequality. The bound can be improved by working directly 
with the exact probability distribution of the random Euclidean distances. 
The following Lemma (proof is in the Appendix) is key to proving the main 
result of this section. 

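Lemma 3.1. Let $k$ be an even integer, and $0 < \epsilon < 1$. Let $\lambda_1 = k(1 + \epsilon)/2$ and $d = k/2$. Then

$$g(k, \epsilon) = e^{-\lambda_1}\, \frac{\lambda_1^{d-1}}{(d-1)!} \tag{18}$$

is a decreasing function in $k$.

The lower bound for $k$ can then be obtained from the following Theorem.

Theorem 3.2. For any $0 < \epsilon < 1$ and integer $n$, let $k$ be the smallest even integer satisfying $\left(\frac{1+\epsilon}{\epsilon}\right) g(k, \epsilon) \le \frac{1}{n^2}$. Then, for any set $V$ of $n$ points in $\mathbb{R}^p$, there is a linear map $f : \mathbb{R}^p \to \mathbb{R}^k$ such that for any $u, v \in V$,

$$P\left[(1 - \epsilon)\,\|u - v\|^2 \le \|f(u) - f(v)\|^2 \le (1 + \epsilon)\,\|u - v\|^2\right] \ge 1 - \frac{2}{n^2}. \tag{19}$$

The lower bound for $k$ can be obtained numerically by finding the smallest even integer $k$ satisfying the inequality $\left(\frac{1+\epsilon}{\epsilon}\right) g(k, \epsilon) \le \frac{1}{n^2}$.

Next, we provide the proof of Theorem 3.2.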
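The numerical search described in Theorem 3.2 can be sketched as follows. The function names `log_g` and `smallest_even_k` and the parameter `power` are illustrative assumptions: `power = 2` corresponds to the tail level $1/n^2$ of the theorem, while other values allow a different tail level $1/n^{\text{power}}$. The computation is done on the log scale with `math.lgamma` so that large $k$ does not overflow, and the linear scan relies on Lemma 3.1 (monotonicity of $g$ in $k$).

```python
import math

def log_g(k, eps):
    """log g(k, eps), where g = exp(-lam) * lam**(d-1) / (d-1)!,
    lam = k (1 + eps) / 2 and d = k / 2 (k even)."""
    d = k // 2
    lam = 0.5 * k * (1.0 + eps)
    return -lam + (d - 1) * math.log(lam) - math.lgamma(d)   # lgamma(d) = log (d-1)!

def smallest_even_k(n, eps, power=2.0):
    """Smallest even k with ((1 + eps) / eps) * g(k, eps) <= n**(-power)."""
    target = -power * math.log(n)
    log_factor = math.log((1.0 + eps) / eps)
    k = 2
    while log_factor + log_g(k, eps) > target:
        k += 2          # g is decreasing in k, so the first hit is the smallest
    return k

k_hat = smallest_even_k(50, 0.1, power=3.0)
```

For $n = 50$, $\epsilon = 0.1$ and tail level $1/n^3$, this search returns 3976, which agrees with the corresponding Theorem 3.2 entry of Table 1 (the row with $\beta = 1$).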



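Proof of Theorem 3.2. Recall the well-known Gamma-Poisson relationship: suppose $X \sim \mathrm{Gamma}(d, 1)$, and $Y \sim \mathrm{Poisson}(x)$. Then we have $P(X \ge x) = P(Y \le d - 1)$. That is,

$$\int_x^\infty \frac{1}{\Gamma(d)}\, z^{d-1} e^{-z}\, dz = \sum_{y=0}^{d-1} \frac{x^y e^{-x}}{y!}, \tag{20}$$

for $d = 1, 2, 3, \ldots$.

Since $\|y\|^2 = \sum_{j=1}^{k} y_j^2 \sim \chi^2_k = \mathrm{Gamma}(k/2, 2)$, using (20) with $\alpha_1 = k(1 + \epsilon)$, and setting $d = k/2$, the right-tail probability can be written as

$$P\left(\|y\|^2 \ge \alpha_1\right) = e^{-\alpha_1/2} \sum_{y=0}^{d-1} \frac{(\alpha_1/2)^y}{y!},$$

and with $\alpha_2 = k(1 - \epsilon)$, the left-tail probability can be written as

$$P\left(\|y\|^2 \le \alpha_2\right) = e^{-\alpha_2/2} \sum_{y=d}^{\infty} \frac{(\alpha_2/2)^y}{y!}. \tag{21}$$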

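We introduce the following Theorem (proof is in the Appendix), which is essential in establishing the bound for the tail probabilities.

Theorem 3.3. Let $d$ be a positive integer.

a) Let $1 \le d < \lambda_1$. Then,

$$\sum_{y=0}^{d-1} \frac{\lambda_1^y}{y!} \le \left(\frac{\lambda_1}{\lambda_1 - d}\right) \left(\frac{\lambda_1^{d-1}}{(d-1)!}\right). \tag{22}$$

b) Let $0 < \lambda_2 < d$. Then,

$$\sum_{y=d}^{\infty} \frac{\lambda_2^y}{y!} \le \left(\frac{\lambda_2}{d - \lambda_2}\right) \left(\frac{\lambda_2^{d-1}}{(d-1)!}\right). \tag{23}$$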
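The two inequalities of Theorem 3.3 can be sanity-checked numerically for small $d$. The sketch below is a minimal illustration with hypothetical helper names; it evaluates both Poisson-type sums directly and compares them with the stated bounds for $d = 10$, $\lambda_1 = d(1 + \epsilon)$ and $\lambda_2 = d(1 - \epsilon)$ with $\epsilon = 0.2$. The infinite sum in part b) is truncated after `n_terms` terms, an assumption that is harmless here because the terms decay faster than geometrically.

```python
import math

def poisson_head(lam, d):
    """Sum of lam**y / y! for y = 0, ..., d - 1 (left side of part a)."""
    total, term = 0.0, 1.0
    for y in range(d):
        total += term
        term *= lam / (y + 1)
    return total

def poisson_tail(lam, d, n_terms=500):
    """Sum of lam**y / y! for y = d, ..., d + n_terms - 1 (truncated left side of part b)."""
    term = math.exp(d * math.log(lam) - math.lgamma(d + 1))   # lam**d / d!
    total = 0.0
    for y in range(d, d + n_terms):
        total += term
        term *= lam / (y + 1)
    return total

def last_term(lam, d):
    """lam**(d-1) / (d-1)!, the common factor in both bounds."""
    return math.exp((d - 1) * math.log(lam) - math.lgamma(d))

d, eps = 10, 0.2                                   # illustrative values only
lam1, lam2 = d * (1 + eps), d * (1 - eps)
head, head_bound = poisson_head(lam1, d), lam1 / (lam1 - d) * last_term(lam1, d)
tail, tail_bound = poisson_tail(lam2, d), lam2 / (d - lam2) * last_term(lam2, d)
```

In this example both bounds overshoot the exact sums by roughly a factor of two; the comparison in Table 1 suggests that the resulting loss in $k$ relative to the exact solution is small.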



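Using Theorem 3.3 with $\lambda_1 = \alpha_1/2 = k(1 + \epsilon)/2$ and $d = k/2$, the right-tail probability is bounded as follows:

$$P\left[\|y\|^2 \ge \alpha_1\right] = e^{-\lambda_1} \sum_{y=0}^{d-1} \frac{\lambda_1^y}{y!} \le \left(\frac{\lambda_1}{\lambda_1 - d}\right) e^{-\lambda_1}\, \frac{\lambda_1^{d-1}}{(d-1)!} = \left(\frac{1+\epsilon}{\epsilon}\right) g(k, \epsilon). \tag{24}$$

For the left-tail probability, setting $\lambda_2 = \alpha_2/2 = k(1 - \epsilon)/2$, it follows from Theorem 3.3 that

$$P\left[\|y\|^2 \le \alpha_2\right] = e^{-\lambda_2} \sum_{y=d}^{\infty} \frac{\lambda_2^y}{y!} \tag{25}$$

$$\le \left(\frac{\lambda_2}{d - \lambda_2}\right) e^{-\lambda_2}\, \frac{\lambda_2^{d-1}}{(d-1)!} \tag{26}$$

$$\le \left(\frac{1+\epsilon}{\epsilon}\right) e^{-\lambda_1}\, \frac{\lambda_1^{d-1}}{(d-1)!}, \tag{27}$$

where the last inequality follows since $e^{\lambda_1 - \lambda_2} \le \left(\frac{\lambda_1}{\lambda_2}\right)^d$. Note that the bound for the left-tail probability is the same as that for the right-tail probability. Thus,

$$P\left[\|y\|^2 \ge \alpha_1\right] + P\left[\|y\|^2 \le \alpha_2\right] \le 2 \left(\frac{1+\epsilon}{\epsilon}\right) g(k, \epsilon). \tag{28}$$

For a given $\epsilon$, we can obtain the lower bound for $k$ by numerically obtaining the smallest even integer $k$ such that $\left(\frac{1+\epsilon}{\epsilon}\right) g(k, \epsilon)$ is less than or equal to $1/n^2$. □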



A numerical comparison of the two bounds is presented in Table 1 and will be discussed in detail in section 5.

The Johnson-Lindenstrauss (JL) Lemma states that a set of $n$ points in any Euclidean space can be mapped to a Euclidean space of dimension $k = O(\ln n/\epsilon^2)$ such that the pairwise distances between the points are preserved within a factor of $1 \pm \epsilon$. Since the $L_1$ distance is more robust against outliers than the $L_2$ distance, it is of interest to explore the effect of Random Projection on dimension reduction using the $L_1$ norm. In other words, a linear mapping for a set of $n$ points from $p$-dimensional space to $k = O(\ln n/\epsilon^2)$ dimensional space is desirable so that the pairwise $L_1$ distances between the points are preserved within a factor of $1 \pm \epsilon$. However, due to the results of Brinkman and Charikar (2003), Charikar and Sahai (2002), Lee and Naor (2004), and Indyk (2006), the JL Lemma cannot be extended to the $L_1$ norm using a linear mapping. Li et al. (2007) proposed three nonlinear mappings (bias-corrected sample median, bias-corrected geometric mean, and bias-corrected maximum likelihood mappings) using the $L_1$ norm with standard Cauchy entries in the random matrix, and obtained $k = O(\ln n/\epsilon^2)$.

Although it is not possible in the case of a linear mapping to obtain a totally satisfying result when the $L_1$ norm is used to measure distances in both the space of points to be projected and the space of projected points, it is possible to obtain good results by using the $L_2$ norm in the space of points to be projected and the $L_1$ norm to measure distance between the projected points, as discussed next.



4 L2-L1 norm with Gaussian Random Matrix

Here a theorem for the linear projection of $n$ points in $p$-dimensional space onto a $k$-dimensional space using i.i.d. standard Gaussians as entries of the random matrix $\Gamma$ is presented, where the $L_2$ norm is used as a distance in the original space, and $L_1$ is used as a distance in the $k$-dimensional target space. It turns out that the original $L_2$ pairwise distances are within a factor of $(1 \pm \epsilon)\sqrt{2/\pi}$ of the projected $L_1$ distances. For the same factor of $(1 \pm \epsilon)\sqrt{2/\pi}$, Ailon and Chazelle (2006) (sparse Gaussian random matrix with fast Fourier transform) and Matousek (2007) (sparse Achlioptas-typed random matrix) obtain the lower bound for $k$ to be:

$$k \ge C\epsilon^{-2}\left(2 \ln(1/\delta)\right) \tag{29}$$

where $\delta \in (0, 1)$, $\epsilon \in (0, 1/2)$, and $C$ is a sufficiently large constant. Here, $\delta$ is a parameter that relates to the probability with which any two projected points remain within $(1 \pm \epsilon)\sqrt{2/\pi}$ of the $L_2$ distance of the original points. Although the multiplicative constant $C$ is not provided, it was taken to be 1 in one of the proofs in Matousek (2007). When $\delta = 1/n^2$, then $k = O\left(\frac{\ln n}{\epsilon^2}\right)$.

The following Theorem gives an improvement on the lower bound for $k$ provided by Ailon and Chazelle (2006) and Matousek (2007).

In what follows, for $s > 0$, let $A(s) = 2 e^{-s\sqrt{2/\pi}\,(1+\epsilon) + s^2/2}\, \Phi(s)$. For a given $\epsilon \in (0, 1)$, let $s^*(\epsilon)$ be the value that minimizes $A(s)$. Equivalently, let $s^*$ be the unique solution to $s = \sqrt{2/\pi}\,(1 + \epsilon) - \frac{\phi(s)}{\Phi(s)}$.



Theorem 4.1. For any $0 < \epsilon < 1$ and any positive integer $n$, let $k$ be such that

$$k \ge \frac{2 \ln n}{-\ln\left(A(s^*(\epsilon))\right)}. \tag{30}$$

Let $\Gamma$ be a $p \times k$ random matrix with i.i.d. standard Gaussian entries. For $x \in \mathbb{R}^p$, define the mapping $f : \mathbb{R}^p \to \mathbb{R}^k$ by $f(x) = \frac{1}{k}\, x\Gamma$. Then, for any set $V$ of $n$ points in $\mathbb{R}^p$ and any $u, v \in V$,

$$P\left[(1 - \epsilon)\sqrt{2/\pi}\,\|u - v\|_2 \le \|f(u) - f(v)\|_1 \le (1 + \epsilon)\sqrt{2/\pi}\,\|u - v\|_2\right] \ge 1 - \frac{2}{n^2}. \tag{31}$$

A numerical comparison between the bounds obtained by Matousek (2007) and Ailon and Chazelle (2006) and the bound given by Theorem 4.1 is presented in Table 2 and fully discussed in section 5.
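The bound of Theorem 4.1 has no closed form because $s^*$ is defined implicitly, but it is easy to evaluate. The sketch below is a minimal standard-library illustration: it finds $s^*$ by bisection on the first-order condition $s = \sqrt{2/\pi}(1+\epsilon) - \phi(s)/\Phi(s)$ and then evaluates the lower bound for $k$. The function names, the bisection tolerance, and the parameter `power` (tail level $1/n^{\text{power}}$, with `power = 2` matching the theorem) are assumptions made for illustration.

```python
import math

SQRT_2_OVER_PI = math.sqrt(2.0 / math.pi)

def phi(s):
    """Standard Gaussian density."""
    return math.exp(-0.5 * s * s) / math.sqrt(2.0 * math.pi)

def Phi(s):
    """Standard Gaussian cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(s / math.sqrt(2.0)))

def log_A(s, eps):
    """log A(s) with A(s) = 2 exp(-s sqrt(2/pi) (1+eps) + s^2/2) Phi(s)."""
    return math.log(2.0) - s * SQRT_2_OVER_PI * (1.0 + eps) + 0.5 * s * s + math.log(Phi(s))

def s_star(eps, tol=1e-13):
    """Solve s = sqrt(2/pi) (1+eps) - phi(s)/Phi(s) by bisection on (0, a)."""
    a = SQRT_2_OVER_PI * (1.0 + eps)
    lo, hi = 0.0, a
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mid - a + phi(mid) / Phi(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def l2_l1_bound(n, eps, power=2.0):
    """Smallest integer k with k >= power * ln n / (-ln A(s*))."""
    return math.ceil(power * math.log(n) / (-log_A(s_star(eps), eps)))

k_hat = l2_l1_bound(50, 0.1, power=3.0)
```

For $n = 50$, $\epsilon = 0.1$ and tail level $1/n^3$, this evaluates to 1398, which agrees with the corresponding Theorem 4.1 entry of Table 2 (the row with $\beta = 1$).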



4.1 L2-L1 norm with Achlioptas-typed Random Matrix

The following Corollary provides an extension of Theorem 4.1 to the case where the entries of $\Gamma$ are drawn from the Achlioptas types of distribution (eq. (4) with $q = 1, 2, 3$).

Corollary 4.2. For any $0 < \epsilon < 1$ and any positive integer $n$, let $k$ be as in eq. (30) of Theorem 4.1. Let $\Gamma$ be a $p \times k$ random matrix with i.i.d. entries drawn from one of the Achlioptas distributions (eq. (4) with $q = 1, 2$ or $3$). For $x \in \mathbb{R}^p$, define the mapping $f : \mathbb{R}^p \to \mathbb{R}^k$ by $f(x) = \frac{1}{k}\, x\Gamma$. Then, for any set $V$ of $n$ points in $\mathbb{R}^p$ and any $u, v \in V$,

$$P\left[(1 - \epsilon)\sqrt{2/\pi}\,\|u - v\|_2 \le \|f(u) - f(v)\|_1 \le (1 + \epsilon)\sqrt{2/\pi}\,\|u - v\|_2\right] \ge 1 - \frac{2}{n^2}. \tag{32}$$

Note that the lower bound for $k$ using an Achlioptas-typed random matrix is the same as the lower bound for $k$ using a Gaussian random matrix. The proof of Corollary 4.2 follows from Theorem 4.1 after bounding the moment generating function (mgf) of an Achlioptas-typed random variable by the mgf of a standard Gaussian random variable.



5 Concluding remarks 

All the results considered in the paper were given in terms of the probability that the distance between one pair of points is not substantially distorted when projected, and a lower bound on this probability was chosen as $1 - 2/n^2$. However, in most applications, the user is interested in simultaneously preserving distances among all $\binom{n}{2}$ pairs of distinct points selected from $V$. Thus, of interest is a lower bound on the probability of the event

$$\bigcap_{u, v \in V} \left\{(1 - \epsilon)\,\|u - v\|^2 \le \|f(u) - f(v)\|^2 \le (1 + \epsilon)\,\|u - v\|^2\right\}, \tag{33}$$

for example. Since the probability of this event is bounded below by

$$1 - \sum_{u, v \in V} P\left[\left\{(1 - \epsilon)\,\|u - v\|^2 \le \|f(u) - f(v)\|^2 \le (1 + \epsilon)\,\|u - v\|^2\right\}^c\right],$$

where $A^c$ denotes the complement of $A$, and since each term in the sum is less than $2/n^2$, the probability of the event in (33) is bounded from below by $1/n$. It follows that to obtain a better lower bound for the probability of the event in (33) using the present techniques, a different bound for the probabilities of the events $\left\{\|f(u) - f(v)\|^2 \ge (1 + \epsilon)\,\|u - v\|^2\right\}$ and $\left\{\|f(u) - f(v)\|^2 \le (1 - \epsilon)\,\|u - v\|^2\right\}$ must be selected. Thus, Achlioptas (2001) introduces a parameter $\beta > 0$ so that for each pair $u, v \in V$,

$$P\left[(1 - \epsilon)\,\|u - v\|^2 \le \|f(u) - f(v)\|^2 \le (1 + \epsilon)\,\|u - v\|^2\right] \ge 1 - 2/n^{2+\beta}.$$

With this choice, the probability of the event in (33) is then seen to be bounded from below by $1 - 1/n^{\beta}$. The parameter $\beta$ becomes a fine-tuning parameter that affects the probability of the event in (33). Taking $\beta > 0$ into account in our results, the new expressions for the lower bounds for $k$ are as follows:

(i) Dasgupta and Gupta (1999): $k \ge \frac{(24 + 12\beta) \ln n}{3\epsilon^2 - 2\epsilon^3}$.

(ii) Theorem 3.2: $k$ is the smallest even integer satisfying $\left(\frac{1+\epsilon}{\epsilon}\right) g(k, \epsilon) \le \frac{1}{n^{2+\beta}}$, where $g(k, \epsilon)$ is defined in Lemma 3.1.

(iii) Matousek (2007), and Ailon and Chazelle (2006): $k \ge C\epsilon^{-2}\left((4 + 2\beta) \ln n\right)$.

(iv) Theorem 4.1 and Corollary 4.2: $k \ge \frac{(2 + \beta) \ln n}{-\ln\left(A(s^*)\right)}$, where $A(s^*)$ is defined in section 4.
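The two closed-form bounds in the list above, (i) and (iii), can be evaluated directly. The sketch below is a minimal illustration with hypothetical function names; it takes $C = 1$ by default, as in the comparison of Table 2, and rounds up to the next integer.

```python
import math

def dasgupta_gupta_k(n, eps, beta):
    """Bound (i): k >= (24 + 12 beta) ln n / (3 eps^2 - 2 eps^3), rounded up."""
    return math.ceil((24.0 + 12.0 * beta) * math.log(n)
                     / (3.0 * eps ** 2 - 2.0 * eps ** 3))

def matousek_k(n, eps, beta, C=1.0):
    """Bound (iii): k >= C eps^-2 (4 + 2 beta) ln n, rounded up."""
    return math.ceil(C * (4.0 + 2.0 * beta) * math.log(n) / eps ** 2)

# A few (n, eps, beta) settings that also appear in Tables 1 and 2.
examples = {
    (50, 0.1, 1): (dasgupta_gupta_k(50, 0.1, 1), matousek_k(50, 0.1, 1)),
    (1000, 0.3, 2): (dasgupta_gupta_k(1000, 0.3, 2), matousek_k(1000, 0.3, 2)),
}
```

These evaluate to 5030 and 2348 for $n = 50$, $\epsilon = 0.1$, $\beta = 1$, and to 1536 and 615 for $n = 1000$, $\epsilon = 0.3$, $\beta = 2$, matching the "JL Lemma" column of Table 1 and the Matousek column of Table 2.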



The following tables provide a comparison between the results presented here and the results available in the literature. Table 1 gives a comparison of the lower bounds for $k$ for various values of $n$, $\epsilon$ and $\beta$ obtained from various approaches: Theorem 3.2, Dasgupta and Gupta's version of the JL Lemma, and the exact solution method. The exact solution method numerically finds the smallest integer $k$ such that the sum of the left- and right-tail probabilities, i.e. $P\left[\|y\|^2 \ge \alpha_1\right] + P\left[\|y\|^2 \le \alpha_2\right]$, is less than or equal to $2/n^{2+\beta}$. Note that the exact solution method uses directly the sum of the left- and right-tail probabilities, whereas Theorem 3.2 provides an intermediate bound for the sum of the tail probabilities and then sets the intermediate bound less than or equal to $2/n^{2+\beta}$ to obtain the lower bound for $k$. The random matrix has i.i.d. standard Gaussian entries. We see that the lower bound for $k$ using Theorem 3.2 is very close to the lower bound for $k$ using the exact solution method, and significantly improves on the lower bound for $k$ given by Dasgupta and Gupta's version of the JL Lemma. The advantage provided by our approach is reflected in the additional percentage dimension reduction of at least 13% in all cases considered. In some of the cases, we achieve a 30% additional reduction in dimension when compared to the Dasgupta and Gupta bound.

Table 2 compares the lower bound for $k$ obtained from Ailon and Chazelle (2006) and Matousek (2007) for L2-L1 distance ($C = 1$), and Theorem 4.1 for L2-L1 distance. The random matrix has i.i.d. standard Gaussian entries. We observe that the lower bounds for $k$ from Ailon and Chazelle (2006) and Matousek (2007) are significantly larger than the lower bound for $k$ obtained from Theorem 4.1. In most cases, the results of Theorem 4.1 provide an additional reduction of 35%-40% in the lower bound for $k$.
Table 1: Comparison of the lower bounds for k for L2-L2 distance: exact solution (numerically solving for k after setting the sum of left- and right-tail probabilities equal to 2/n^(2+β)), Theorem 3.2, and JL Lemma. N(0,1) entries.

n        ε, β                exact solution   Theorem 3.2   JL Lemma
n=50     ε = .1, β = 1       3776             3976          5030
         ε = .3, β = 1       456              494           653
         ε = .1, β = 2       5336             5572          6707
         ε = .3, β = 2       654              692           870
n=100    ε = .1, β = 1       4601             4822          5921
         ε = .3, β = 1       561              598           768
         ε = .1, β = 2       6461             6716          7895
         ε = .3, β = 2       797              834           1024
n=500    ε = .1, β = 1       6552             6808          7991
         ε = .3, β = 1       808              846           1036
         ε = .1, β = 2       9110             9390          10654
         ε = .3, β = 2       1130             1168          1382
n=1000   ε = .1, β = 1       7403             7670          8882
         ε = .3, β = 1       916              954           1152
         ε = .1, β = 2       10262            10548         11842
         ε = .3, β = 2       1274             1312          1536

Table 2: Normal random matrix: comparison of the lower bounds for k from Matousek (2007) for L2-L1 distance (C = 1), and Theorem 4.1 for L2-L1 distance. N(0,1) entries.

n        ε, β                L2-L1 Matousek   L2-L1 Theorem 4.1
n=50     ε = .1, β = 1       2348             1398
         ε = .3, β = 1       261              168
         ε = .1, β = 2       3130             1863
         ε = .3, β = 2       348              223
n=100    ε = .1, β = 1       2764             1645
         ε = .3, β = 1       308              197
         ε = .1, β = 2       3685             2193
         ε = .3, β = 2       410              263
n=500    ε = .1, β = 1       3729             2220
         ε = .3, β = 1       415              266
         ε = .1, β = 2       4972             2960
         ε = .3, β = 2       553              354
n=1000   ε = .1, β = 1       4145             2468
         ε = .3, β = 1       461              296
         ε = .1, β = 2       5527             3290
         ε = .3, β = 2       615              394

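A Proofs

Proof of Lemma 3.1. Proving that $g(k + 2, \epsilon) < g(k, \epsilon)$ is equivalent to proving that

$$(1 + \epsilon)\, e^{-(1+\epsilon)} \left(1 + \frac{1}{d}\right)^d < 1. \tag{34}$$

Observe that $\left(1 + \frac{1}{d}\right)^d < e$, and thus,

$$(1 + \epsilon)\, e^{-(1+\epsilon)} \left(1 + \frac{1}{d}\right)^d < (1 + \epsilon)\, e^{-\epsilon} < 1. \tag{35}$$

Proof of Theorem 3.3, Part a: Suppose $1 \le d < \lambda_1$. Dividing both sides of (22) by $\left(\frac{\lambda_1}{\lambda_1 - d}\right) \left(\frac{\lambda_1^{d-1}}{(d-1)!}\right)$, it is seen that (22) is equivalent to

$$\left(\frac{\lambda_1 - d}{\lambda_1}\right) \left(1 + \frac{d-1}{\lambda_1} + \frac{(d-1)(d-2)}{\lambda_1^2} + \cdots + \frac{(d-1)!}{\lambda_1^{d-1}}\right) \le 1. \tag{36}$$

But

$$1 + \frac{d-1}{\lambda_1} + \frac{(d-1)(d-2)}{\lambda_1^2} + \cdots + \frac{(d-1)!}{\lambda_1^{d-1}} \tag{37}$$

$$\le \sum_{j=0}^{d-1} \left(\frac{d}{\lambda_1}\right)^j \tag{38}$$

$$= \frac{1 - (d/\lambda_1)^d}{1 - d/\lambda_1} \le \frac{\lambda_1}{\lambda_1 - d}, \tag{39}$$

where (39) is obtained from the finite geometric sum.

The inequality in (36) follows immediately from (39).

Proof of Theorem 3.3, Part b: Suppose $0 < \lambda_2 < d$. Dividing both sides of (23) by $\left(\frac{\lambda_2}{d - \lambda_2}\right) \left(\frac{\lambda_2^{d-1}}{(d-1)!}\right)$, (23) is seen to be equivalent to

$$\left(\frac{d - \lambda_2}{\lambda_2}\right) \frac{\lambda_2}{d} \left(1 + \frac{\lambda_2}{d+1} + \frac{\lambda_2^2}{(d+1)(d+2)} + \cdots\right) \le 1. \tag{40}$$

But,

$$\frac{\lambda_2}{d} \left(1 + \frac{\lambda_2}{d+1} + \frac{\lambda_2^2}{(d+1)(d+2)} + \cdots\right) \tag{41}$$

$$\le \sum_{j=0}^{\infty} \left(\frac{\lambda_2}{d}\right)^{j+1} \tag{42}$$

$$= \frac{\lambda_2}{d - \lambda_2}. \tag{43}$$

Thus, (40) follows immediately from (43).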

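Proof of Theorem 4.1. Let $\Gamma$ be a random matrix of dimension $p \times k$ with i.i.d. entries $r_{ij} \sim N(0, 1)$. For $x \in \mathbb{R}^p$, define a linear mapping $f : \mathbb{R}^p \to \mathbb{R}^k$ by $f(x) = \frac{1}{k}\, x\Gamma$. Let

$$y_j = \frac{x r_j}{\|x\|_2} \sim N(0, 1). \tag{44}$$

Then, $E(\|y\|_1) = k\sqrt{2/\pi}$, and $M_{|y_j|}(s) = 2 e^{s^2/2}\, \Phi(s)$.

Let $\alpha_1 = k\sqrt{2/\pi}\,(1 + \epsilon)$; then the right-tail probability is bounded by

$$P\left[\|f(x)\|_1 \ge \sqrt{2/\pi}\,(1 + \epsilon)\,\|x\|_2\right] = P\left[\|y\|_1 \ge \alpha_1\right] \le \left(2 e^{-(s\alpha_1/k) + (s^2/2)}\, \Phi(s)\right)^k, \quad s > 0. \tag{45}$$

Let $A(s) = 2 e^{-(s\alpha_1/k) + (s^2/2)}\, \Phi(s)$, and denote by $s^*$ the minimizer of $A$, so that $s^*$ is the solution to

$$s = \sqrt{2/\pi}\,(1 + \epsilon) - \frac{\phi(s)}{\Phi(s)}. \tag{46}$$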

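The second derivative of $A(s)$ with respect to $s$ is taken to ensure that $s^*$ is the minimizer of $A$:

$$A''(s) = 2 e^{-(s\alpha_1/k) + (s^2/2)}\, \Phi(s) \left[\left(s - \frac{\alpha_1}{k}\right)^2 + 1 + \left(s - \frac{2\alpha_1}{k}\right) \frac{\phi(s)}{\Phi(s)}\right]. \tag{47}$$

Note that for $s > 0$,

$$\left(s - \frac{\alpha_1}{k}\right)^2 + 1 > \left(\frac{2\alpha_1}{k} - s\right) \frac{\phi(s)}{\Phi(s)}. \tag{48}$$

Thus, $A''(s) > 0$, which implies $s^*$ is the unique minimizer of $A$. Setting $A(s^*)^k \le 1/n^2$, we obtain the lower bound for $k$ to be $k \ge \frac{2 \ln n}{-\ln\left(A(s^*)\right)}$.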

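Similarly, let $\alpha_2 = k\sqrt{2/\pi}\,(1 - \epsilon)$; then the left-tail probability is bounded by

$$P\left[\|f(x)\|_1 \le \sqrt{2/\pi}\,(1 - \epsilon)\,\|x\|_2\right] = P\left[\|y\|_1 \le \alpha_2\right] \le \left(2 e^{(s\alpha_2/k) + (s^2/2)} \left(1 - \Phi(s)\right)\right)^k, \quad s > 0. \tag{49}$$

Let

$$B(s) = 2 e^{(s\alpha_2/k) + (s^2/2)} \left(1 - \Phi(s)\right). \tag{50}$$

The next proposition provides $B(s) \le A(s)$.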
Proposition A.1. For all $\zeta > 0$, we have

$$e^{2\zeta\sqrt{2/\pi}} \le \frac{\Phi(\zeta)}{1 - \Phi(\zeta)}. \tag{51}$$

Proof of Proposition A.1. Let $f(\zeta) = \frac{\Phi(\zeta)}{1 - \Phi(\zeta)}\, e^{-2\zeta\sqrt{2/\pi}}$; then eq. (51) is equivalent to

$$f(\zeta) \ge 1. \tag{52}$$

It suffices to prove that $f(\zeta)$ is an increasing function. Taking the derivative of $f$ with respect to $\zeta$ yields

$$f'(\zeta) = \frac{e^{-2\zeta\sqrt{2/\pi}}}{1 - \Phi(\zeta)} \left[\frac{\phi(\zeta)}{1 - \Phi(\zeta)} - 2\sqrt{2/\pi}\, \Phi(\zeta)\right]. \tag{53}$$

We should note that the first term is positive. The ratio $\frac{\phi(\zeta)}{1 - \Phi(\zeta)}$ is the inverse of the Mills ratio, which is an increasing function, and we observe that

$$\frac{\phi(\zeta)}{1 - \Phi(\zeta)} \ge 2\sqrt{2/\pi}\, \Phi(\zeta), \tag{54}$$

which implies $f'(\zeta) \ge 0$, and hence, $f$ is an increasing function of $\zeta$. The minimum of $f$ is attained when $\zeta = 0$. In other words, $\min f(\zeta) = 1$, and hence eq. (52) is proven.
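Proposition A.1 can also be checked numerically on a grid. The sketch below is a minimal illustration under the assumption that a finite grid on $(0, 8]$ is representative; it compares the two sides of (51) on the log scale and uses `math.erfc` so that $1 - \Phi(\zeta)$ keeps its accuracy in the upper tail. The names `log_odds`, `grid` and `gaps` are hypothetical.

```python
import math

C = 2.0 * math.sqrt(2.0 / math.pi)

def log_odds(z):
    """log( Phi(z) / (1 - Phi(z)) ), computed with erfc for tail accuracy."""
    upper = 0.5 * math.erfc(z / math.sqrt(2.0))     # 1 - Phi(z)
    lower = 0.5 * math.erfc(-z / math.sqrt(2.0))    # Phi(z)
    return math.log(lower) - math.log(upper)

# gap(z) = log f(z): log of the right side of (51) minus log of the left side.
grid = [0.05 * i for i in range(1, 161)]            # zeta in (0, 8]
gaps = [log_odds(z) - C * z for z in grid]
```

On this grid every gap is positive and the gaps increase with $\zeta$, consistent with $f$ being increasing and bounded below by $f(0) = 1$. Near $\zeta = 0$ the gap is very small, since both sides of (51) and their first derivatives coincide at the origin.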

Using Proposition A.1 with $\zeta = s$, $B(s) \le A(s)$ for $s > 0$. Thus, the left-tail probability is bounded by

$$P\left[\|f(x)\|_1 \le \sqrt{2/\pi}\,(1 - \epsilon)\,\|x\|_2\right] \le \left(2 e^{-(s\alpha_1/k) + (s^2/2)}\, \Phi(s)\right)^k. \tag{55}$$

Note that the right side of inequality (55) for the left-tail probability is the same as in the case for the right-tail probability.

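Proof of Corollary 4.2. Let $\Gamma$ be a random matrix of dimension $p \times k$ with i.i.d. entries from an Achlioptas distribution ($q = 1, 2$ or $3$). For $x \in \mathbb{R}^p$, define a linear mapping $f : \mathbb{R}^p \to \mathbb{R}^k$ by $f(x) = \frac{1}{k}\, x\Gamma$. Let

$$y_j = \frac{x r_j}{\|x\|_2} = \sum_{i=1}^{p} c_i r_{ij}, \tag{56}$$

where $c_i = \frac{x_i}{\|x\|_2}$, so that $\sum_{i=1}^{p} c_i^2 = 1$. Then, $E(\|y\|_1) = k\sqrt{2/\pi}$, and

$$M_{y_j}(t) = \prod_{i=1}^{p} \left(1 + \frac{1}{q}\left(\cosh(c_i t \sqrt{q}) - 1\right)\right), \quad \forall t. \tag{57}$$

We introduce the following proposition to provide a bound on $M_{y_j}(t)$.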
Proposition A.2. For $x \in \mathbb{R}$, and $q = 1, 2$ or $3$, we have

$$1 + \frac{1}{q}\left(\cosh(x\sqrt{q}) - 1\right) \le e^{x^2/2}. \tag{58}$$

Proof of Proposition A.2. Our proof will show that $g(x) = \frac{1 + \frac{1}{q}\left(\cosh(x\sqrt{q}) - 1\right)}{e^{x^2/2}}$ takes its maximum value of 1 at $x = 0$. By a symmetry argument, we only need to consider the case $x \ge 0$ and show that $g$ is decreasing in $x \ge 0$.

For $q = 1$, $g(x) = \frac{\cosh(x)}{e^{x^2/2}}$ is decreasing in $x \ge 0$.

For the cases $q = 2$ and $q = 3$, $g(x) = \frac{1 + \frac{1}{q}\left(\cosh(x\sqrt{q}) - 1\right)}{e^{x^2/2}}$. To prove that $g$ is decreasing, we need $g'(x) \le 0$.

$$g'(x) = \frac{1}{e^{x^2/2}} \left(\frac{1}{\sqrt{q}} \sinh(x\sqrt{q}) - x\left(1 + \frac{1}{q}\left(\cosh(x\sqrt{q}) - 1\right)\right)\right) \tag{59}$$

with $g'(0) = 0$. Let

$$h(x) = \frac{1}{\sqrt{q}} \sinh(x\sqrt{q}) - x\left(1 + \frac{1}{q}\left(\cosh(x\sqrt{q}) - 1\right)\right). \tag{60}$$

Since $x \ge 0$, and $h(0) = 0$, if $h'(x) \le 0$, then $x = 0$ is the maximum and $h(x) \le 0$. But

$$h'(x) = \left(1 - \frac{1}{q}\right)\left(\cosh(x\sqrt{q}) - 1\right) - \frac{x}{\sqrt{q}} \sinh(x\sqrt{q}) \tag{61}$$

with $h'(0) = 0$. Let

$$l(x) = (q - 1)\left(\cosh(x\sqrt{q}) - 1\right) - x\sqrt{q}\, \sinh(x\sqrt{q}); \tag{62}$$

then

$$l'(x) = \sqrt{q}\,(q - 2) \sinh(x\sqrt{q}) - xq \cosh(x\sqrt{q}) \tag{63}$$

with $l'(0) = 0$. For $q = 2$, we have $l'(x) = -2x \cosh(x\sqrt{2}) \le 0$, which implies $g(x)$ is decreasing for $x \ge 0$.

For $q = 3$, let

$$m(x) = l'(x) = \sqrt{3}\, \sinh(x\sqrt{3}) - 3x \cosh(x\sqrt{3}); \tag{64}$$

then $m'(x) = -3\sqrt{3}\, x \sinh(x\sqrt{3}) \le 0$, which implies $g(x)$ is decreasing for $x \ge 0$. Thus, Proposition A.2 is proven. □
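The inequality of Proposition A.2 is easy to probe numerically. The sketch below is a minimal illustration with hypothetical names and an arbitrarily chosen grid on $[-5, 5]$; it evaluates the ratio of the left side of (58) to $e^{x^2/2}$ and records its largest value for each $q$. As an aside that goes beyond the proposition, the same ratio is also evaluated at $x = 1$ for $q = 4$, where it exceeds 1.

```python
import math

def mgf_factor(x, q):
    """Left side of (58): 1 + (cosh(x sqrt(q)) - 1) / q."""
    return 1.0 + (math.cosh(x * math.sqrt(q)) - 1.0) / q

def ratio(x, q):
    """Left side of (58) divided by exp(x^2 / 2); (58) says this is at most 1."""
    return mgf_factor(x, q) * math.exp(-0.5 * x * x)

grid = [0.01 * i for i in range(-500, 501)]              # x in [-5, 5]
worst = {q: max(ratio(x, q) for x in grid) for q in (1, 2, 3)}

# Outside the range covered by the proposition the bound can fail.
ratio_q4 = ratio(1.0, 4)
```

On this grid the largest ratio for $q = 1, 2, 3$ is 1 (attained at $x = 0$), up to rounding, while the value for $q = 4$ at $x = 1$ is above 1, which suggests that the restriction to $q \le 3$ is not merely a convenience of the proof.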

Using Proposition A.2, for $t \in \mathbb{R}$, $q = 1, 2$ or $3$, and $Z \sim N(0, 1)$,

$$M_{y_j}(t) = \prod_{i=1}^{p} \left(1 + \frac{1}{q}\left(\cosh(c_i t \sqrt{q}) - 1\right)\right) \le \prod_{i=1}^{p} e^{c_i^2 t^2/2} = e^{t^2/2} = M_Z(t). \tag{65}$$

The inequality in (65) implies

$$M_{|y_j|}(t) \le M_{|Z|}(t) = 2 e^{t^2/2}\, \Phi(t), \quad t \in \mathbb{R}. \tag{66}$$

Thus, for $\alpha_1 = k\sqrt{2/\pi}\,(1 + \epsilon)$, the right-tail probability is bounded by

$$P\left[\|f(x)\|_1 \ge \sqrt{2/\pi}\,(1 + \epsilon)\,\|x\|_2\right] = P\left[\|y\|_1 \ge \alpha_1\right] \le \left(2 e^{-(s\alpha_1/k) + (s^2/2)}\, \Phi(s)\right)^k, \quad s > 0, \tag{67}$$

where the last inequality (67) is the same as in the case of a Gaussian random matrix.

Similarly, for $\alpha_2 = k\sqrt{2/\pi}\,(1 - \epsilon)$, the left-tail probability is bounded by

$$P\left[\|f(x)\|_1 \le \sqrt{2/\pi}\,(1 - \epsilon)\,\|x\|_2\right] = P\left[\|y\|_1 \le \alpha_2\right]$$

$$\le \left(e^{(s\alpha_2/k)}\, M_{|Z|}(-s)\right)^k$$

$$= \left(2 e^{(s\alpha_2/k) + (s^2/2)} \left(1 - \Phi(s)\right)\right)^k \tag{68}$$

$$\le \left(2 e^{-(s\alpha_1/k) + (s^2/2)}\, \Phi(s)\right)^k. \tag{69}$$

The last inequality is the same as in the case of the right-tail probability, and hence, we are done.